Conversation
cc @inducer
pyop2/base.py (Outdated)

    """
    AVAILABLE_ON_HOST_ONLY = 1
    AVAILABLE_ON_DEVICE_ONLY = 2
    AVAILABLE_ON_BOTH = 3
Maybe IntFlag instead of IntEnum? (just something to perhaps consider...)
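To illustrate the suggestion, a minimal sketch reusing the names from the snippet above (the wrapper class name is hypothetical): with IntFlag, the combined state is the bitwise OR of the other two, and membership tests fall out naturally.

```python
from enum import IntFlag

class Availability(IntFlag):  # hypothetical class name, for illustration only
    AVAILABLE_ON_HOST_ONLY = 1
    AVAILABLE_ON_DEVICE_ONLY = 2
    AVAILABLE_ON_BOTH = AVAILABLE_ON_HOST_ONLY | AVAILABLE_ON_DEVICE_ONLY  # == 3

state = Availability.AVAILABLE_ON_BOTH
assert state & Availability.AVAILABLE_ON_HOST_ONLY  # truthy: host copy is valid
```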
wence- left a comment

Didn't manage to get through everything, but here are a bunch of comments/queries.
pyop2/transforms/snpt.py (Outdated)

    return arg


    def snpt_transform(kernel, block_size):
Renamed it to split_n_across_workgroups, better?
pyop2/backends/cuda.py (Outdated)

    if self.can_be_represented_as_petscvec():
        # report to petsc that we are altering data on the CPU
        with self.vec as petscvec:
            petscvec.array_w
Again, comments about vec state management apply here and, mutatis mutandis, to the rest of the PR.
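For reference, a minimal sketch (standard petsc4py accessors, not PR code) of the state-management distinction at play: `array_w` bumps the Vec's internal state counter so PETSc notices the host-side mutation, while `array_r` leaves it alone.

```python
import numpy as np
from petsc4py import PETSc

vec = PETSc.Vec().createSeq(4)

ro = vec.array_r          # read-only view; state counter untouched
rw = vec.array_w          # writable view; marks the Vec as modified on the CPU
rw[...] = np.arange(4.0)
```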
pyop2/backends/cuda.py (Outdated)

    grid = tuple(int(evaluate(glens[i], parameters)) if i < len(glens) else 1
                 for i in range(2))
    block = tuple(int(evaluate(llens[i], parameters)) if i < len(llens) else 1
                  for i in range(3))
This cached data is generated by a computer. Therefore there should be no need to parse and evaluate it, because the on-disk representation can just be the in-memory representation.
> This cached data is generated by a computer

I preferred a human-readable format to keep the representation valid across PyOP2 versions, which might not be the case with pickling. (Happy to serialize in a different format if there's a better alternative.)

> no need to parse and evaluate it

I don't think we can avoid the evaluates, because the grid sizes are typically symbolic expressions of the form `(end - start) // 32 + 1`. The grid sizes are therefore evaluated at runtime during a GlobalKernel.__call__ invocation.
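To make the runtime-evaluation point concrete, a sketch assuming pymbolic's `parse`/`evaluate` helpers, which the quoted code appears to use:

```python
from pymbolic import parse, evaluate

glens = (parse("(end - start) // 32 + 1"),)    # symbolic grid size
parameters = {"start": 0, "end": 100}          # known only at call time

# Mirrors the quoted code: pad missing dimensions with 1.
grid = tuple(int(evaluate(glens[i], parameters)) if i < len(glens) else 1
             for i in range(2))
assert grid == (4, 1)
```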
Pickle version incompatibilities are fine (they would appear when you upgrade Python, and we don't guarantee very much about the portability of on-disk cached data between firedrake-update calls, let alone Python version upgrades).
What you're attempting to do is cache a function that computes these things. So in the code_to_compile case, just write that function, and pickle it.
Although is this a consequence of the on-disk kernel/compiled code not remembering the loopy representation?
> Pickle version incompatibilities are fine

Cool, will change. (Thanks!)

> Although is this a consequence of the on-disk kernel/compiled code not remembering the loopy representation?

Yes, the compiled code stores only the plain CUDA/OpenCL device kernel. Loopy's on-disk caching mechanism works by storing both the device kernel and the host code that ends up calling that device kernel. But adapting PyOP2 to call loopy kernels directly would be a bit disruptive, and I think loopy's cache dir being $XDG_CACHE_DIR has run into some problems in the past.
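For context, a rough sketch (standard loopy/PyOpenCL usage, not PR code) of the invocation style being discussed: loopy's caching covers both the device kernel and the generated host wrapper that launches it.

```python
import numpy as np
import pyopencl as cl
import loopy as lp

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)

# loopy generates device code *and* a host wrapper; calling the kernel
# object runs the wrapper, which computes launch sizes and enqueues it.
knl = lp.make_kernel(
    "{ [i]: 0 <= i < n }",
    "out[i] = 2 * a[i]")

a = np.arange(16, dtype=np.float64)
evt, (out,) = knl(queue, a=a)  # wrapper infers n and allocates out
```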
> just write that function, and pickle it

Planning to simply generate a Python function corresponding to the expression: inducer/pymbolic#107
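A sketch of that plan, assuming pymbolic's `parse`/`compile` helpers (names here are illustrative): the symbolic grid size is turned into a plain Python callable once, and only that gets cached.

```python
from pymbolic import parse, compile as pmbl_compile

# Symbolic grid size of the usual form, compiled once into a callable.
grid_x = parse("(end - start) // 32 + 1")
f = pmbl_compile(grid_x, ["start", "end"])

assert f(0, 100) == 4  # (100 - 0) // 32 + 1
```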
For my education, how come you chose not to go with the generated invokers?
For PyOpenCL we could borrow the invoker, but for PyCUDA/C-target we don't have an invoker in loopy (yet).
- Also, for the pyopencl invoker, a minor concern is that its disk-caching semantics don't play well with PyOP2's.
- I feel adding an invoker for the PyCUDA target shouldn't be too difficult. Taking a look at it.
There is invoker generation for the C target. I'd expect that putting together CUDA invokers wouldn't be a massive effort based on a copy-paste job of the CL ones. Do you think that might be the right approach?
I am working on integrating the PyCUDA/PyOpenCL invokers. For the C target, reorganizing pyop2/compilation.py is a bigger undertaking and also out of scope for this PR.
pyop2/backends/cuda.py (Outdated)

    .callables_table))
    f.write("(" + ",".join(str(glen) for glen in glens) + ",)")
    f.write("\n")
    f.write("(" + ",".join(str(llen) for llen in llens) + ",)")
This is data to be consumed by a computer, so don't make it human-readable and require parsing.
Do you recommend pickling?
pyop2/backends/cuda.py (Outdated)

    fpath = self.extra_args_cache_file_path
    extra_args_np = [arg
                     for _, arg in natsorted(numpy.load(fpath).items(),
                                             key=lambda x: x[0])]
This is tremendously fragile. Please think of a better way of saving this data in a way that doesn't rely on dubiously-stable properties of numpy.savez/load.
Agreed. While np.savez-ing the arrays, I also stored a string array that records the keys used for storing these arrays. See https://github.com/OP2/PyOP2/compare/9b7eb97878f2b6609d64affe896c5d4d901a2934..0e58f08d3ea1487e01c45df2270fa9e5790a845d.
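A minimal sketch of that approach (helper names are hypothetical): saving an explicit key array alongside the data makes the load order independent of how numpy names or sorts the entries in the .npz.

```python
import numpy as np

def save_extra_args(fpath, extra_args):
    keys = [f"arg_{i}" for i in range(len(extra_args))]
    # Store the key order explicitly next to the arrays themselves.
    np.savez(fpath, _keys=np.array(keys), **dict(zip(keys, extra_args)))

def load_extra_args(fpath):
    with np.load(fpath) as npz:
        # Rebuild the list in exactly the order it was saved.
        return [npz[str(k)] for k in npz["_keys"]]
```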
@kaushikcfd just to make you aware of a couple things:
Hi Connor!
Let me know if anything in this patch isn't clear.
@kaushikcfd I just read the GPU chapter of your thesis (and liked it!). Looking through the code here I can't find the second transform that you describe, nor the various heuristics. Are those available anywhere?
You can find them here: https://github.com/OP2/PyOP2/tree/auto_tiling.
Ah great. Thanks!
Draft because: